
1.1 The Research Question

Much political science research involves the analysis of a dependent variable that has boundary restrictions. In electoral studies, aggregate-level vote share and voter turnout are measured as proportions and are naturally bounded between $0$ and $1$ (Geys, 2006; Holbrook and McClurg, 2005). In public opinion studies, the presidential approval rating is another percentage variable of the same nature (Kernell, 1978; West, 1991). In public policy, learning achievement, whether measured by standardized test scores such as the SAT or by grade point average, is constrained to a bounded point scale (Berger and Toma, 1994; Henry and Rubenstein, 2002). In world politics, the polity score, which measures the level of democracy, ranges from $-10$ to $10$ (Knack, 2004; Rudra, 2005). Examples like these abound, and all of these dependent variables are distributed within fixed bounds. Econometrics textbooks point to the truncated regression model (hereafter the TRM model) as a more appropriate regression method in such cases (Greene, 2008, 863-869).1

The TRM model has been in development for more than three decades.2 It is readily available in statistical software such as Stata (the truncreg command). Previous literature shows that the TRM model has been applied in many disciplines, such as economics, astronomy, and biology (Tsai, Jewell, and Wu, 1988), but its application in political science has been limited. Why do political scientists seldom use this method? And what is the cost of not applying it when the dependent variable is distributed as truncated normal?
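Stata's truncreg fits the model by maximum likelihood. The Python sketch below illustrates that idea under stated assumptions; it is not a reimplementation of truncreg and reports no standard errors. It assumes a linear model with normal errors and a known truncation point, simulates data in which every observation with $y \le 0$ is dropped entirely, and compares OLS on the retained observations with a hand-written truncated-normal likelihood maximized with SciPy. The function names (simulate_truncated, truncated_negloglik, fit_truncated) and all parameter values are hypothetical.

```python
import numpy as np
from scipy import optimize, stats


def simulate_truncated(n, beta0, beta1, sigma, lower, seed=0):
    """Hypothetical helper: y = beta0 + beta1*x + e with e ~ N(0, sigma^2).

    Rows with y <= lower are discarded entirely (both y and x), which is what
    separates truncation from censoring.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    y = beta0 + beta1 * x + rng.normal(scale=sigma, size=n)
    keep = y > lower
    return x[keep], y[keep]


def truncated_negloglik(params, x, y, lower, upper):
    """Negative log-likelihood of a normal linear model truncated to (lower, upper)."""
    b0, b1, log_sigma = params
    sigma = np.exp(log_sigma)  # log scale keeps sigma positive
    mu = b0 + b1 * x
    # Probability mass inside the bounds given each observation's conditional mean.
    mass = stats.norm.sf((lower - mu) / sigma) - stats.norm.sf((upper - mu) / sigma)
    # Clip to avoid log(0) if the optimizer tries extreme parameter values.
    log_mass = np.log(np.clip(mass, 1e-300, None))
    # Truncated density = normal density / mass inside the bounds.
    return -np.sum(stats.norm.logpdf(y, loc=mu, scale=sigma) - log_mass)


def fit_truncated(x, y, lower, upper=np.inf):
    """Maximize the truncated likelihood, starting from OLS estimates."""
    b1_ols, b0_ols = np.polyfit(x, y, 1)  # polyfit returns [slope, intercept]
    resid_sd = np.std(y - (b0_ols + b1_ols * x))
    start = np.array([b0_ols, b1_ols, np.log(resid_sd)])
    result = optimize.minimize(
        truncated_negloglik, start, args=(x, y, lower, upper),
        method="Nelder-Mead",  # derivative-free, robust choice for a small sketch
        options={"maxiter": 5000, "maxfev": 5000, "xatol": 1e-6, "fatol": 1e-6},
    )
    b0, b1, log_sigma = result.x
    return b0, b1, np.exp(log_sigma)


x, y = simulate_truncated(n=5000, beta0=0.0, beta1=1.0, sigma=1.0, lower=0.0, seed=42)
ols_slope = np.polyfit(x, y, 1)[0]
b0_hat, b1_hat, sigma_hat = fit_truncated(x, y, lower=0.0)
print(f"OLS slope: {ols_slope:.2f}, truncated MLE slope: {b1_hat:.2f}")
```

With a true slope of 1 and truncation at the mean of $y$, OLS on the retained observations is attenuated toward zero (about 0.53 in large samples for these settings), while the truncated likelihood should recover a slope near 1. Passing finite values for both lower and upper gives the doubly bounded case.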

____________________

Footnote

1 The truncated regression model is usually applied when the dependent variable has boundary restrictions (Amemiya, 1973, 1984). The boundaries can be single, at a lower or an upper limit, or double, defining an interval (Johnson et al., 1970). When truncation is an essential feature of the distributional assumption, the truncated regression model differs from censored regression, such as the Tobit model (Tobin, 1958), in two respects. First, truncated regression does not allow any observation outside the boundary: for such cases, neither the dependent nor the independent variables are observed. Censored regression, on the other hand, does include observations outside the boundary, but the values of their dependent variable are all collapsed onto the boundary values (Greene, 2008, 869). Second, given the different nature of truncation, the probability density function also differs between the two models. For truncated regression, the probability density function is simply the untruncated normal density divided by the probability that the variable lies between the lower and upper limits. For censored regression, the probability density function is a mixture of discrete and continuous distributions, in which the former captures the censoring mechanism and the latter remains the same as in the uncensored case (Breen, 1996, 4).
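For a normal variable with mean $\mu$ and standard deviation $\sigma$ truncated to the interval $(a, b)$, the truncated density described in this footnote is $f(y) = \phi\left(\frac{y-\mu}{\sigma}\right) / \left(\sigma\left[\Phi\left(\frac{b-\mu}{\sigma}\right) - \Phi\left(\frac{a-\mu}{\sigma}\right)\right]\right)$ for $a < y < b$. The Python sketch below, assuming SciPy is available, codes that density next to a Tobit-style censored likelihood contribution so the two constructions can be compared directly; the function names are hypothetical, and the truncated density can be checked against scipy.stats.truncnorm.

```python
import numpy as np
from scipy import stats


def truncated_normal_pdf(y, mu, sigma, lower, upper):
    """Normal density divided by P(lower < Y < upper); zero outside the bounds."""
    y = np.asarray(y, dtype=float)
    z_lo, z_hi = (lower - mu) / sigma, (upper - mu) / sigma
    mass = stats.norm.cdf(z_hi) - stats.norm.cdf(z_lo)  # probability inside the bounds
    dens = stats.norm.pdf(y, loc=mu, scale=sigma) / mass
    return np.where((y > lower) & (y < upper), dens, 0.0)


def censored_loglik_contribution(y, mu, sigma, lower, upper):
    """Tobit-style log-likelihood term: discrete mass at a bound, normal density inside."""
    if y <= lower:
        return stats.norm.logcdf((lower - mu) / sigma)  # P(Y* <= lower) piles onto lower
    if y >= upper:
        return stats.norm.logsf((upper - mu) / sigma)   # P(Y* >= upper) piles onto upper
    return stats.norm.logpdf(y, loc=mu, scale=sigma)    # same as the uncensored case


print(truncated_normal_pdf([0.2, 0.5, 0.8], mu=0.4, sigma=0.3, lower=0.0, upper=1.0))
```

The truncated density rescales the whole normal curve so it integrates to one over $(a, b)$, while the censored term keeps the normal density unchanged inside the bounds and assigns the tail probabilities to the boundary values themselves.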

2 We can further distinguish the TRM model from the censored regression model and the Heckman model in terms of the data-generating process. Truncated regression requires only a distributional assumption, whereas censored regression requires both a distributional assumption and a censoring mechanism. The same distinction applies to sample-selection models such as the Heckman model (Heckman, 1979), in which data on the outcome are available only when another variable satisfies a selection criterion (Sigelman and Zeng, 1999, 177). Although the observed values of the dependent variable in all three models are distributed as truncated normal, the censored and sample-selection models make different working assumptions about the dependent variable. The censored model assumes an underlying untruncated normal distribution plus a censoring mechanism that confines the dependent variable to a certain range. Similarly, the Heckman model assumes that the error terms of the selection and outcome equations follow a bivariate normal distribution with correlation coefficient $\rho$ (Sartori, 2003, 114). The empirical truncation of the outcome variable then depends on the selection mechanism specified in the Heckman model. Thus, neither the censored nor the sample-selection model directly assumes that the dependent variable is univariate truncated normal; both add an assumption to the data-generating process that may not hold.
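The three data-generating processes contrasted in this footnote can be simulated side by side. This is a minimal NumPy sketch under assumed, hypothetical values (bounds of 0 and 1, mean 0.5, standard deviation 0.3, and a selection-outcome error correlation of 0.6, with no covariates in either equation): the truncated draws come from a normal restricted to the interval by rejection sampling, the censored draws come from an untruncated normal clipped onto the bounds, and the Heckman-style outcome is kept only when a correlated selection error is positive. The helper name simulate_dgps is hypothetical.

```python
import numpy as np


def simulate_dgps(n, lower=0.0, upper=1.0, mu=0.5, sigma=0.3, rho=0.6, seed=0):
    """Hypothetical helper returning samples from three data-generating processes."""
    rng = np.random.default_rng(seed)

    # 1. Truncation: only a distributional assumption. Values outside
    #    (lower, upper) never enter the sample; rejection sampling mimics that.
    kept = np.empty(0)
    while kept.size < n:
        draws = rng.normal(mu, sigma, size=n)
        kept = np.concatenate([kept, draws[(draws > lower) & (draws < upper)]])
    truncated = kept[:n]

    # 2. Censoring: an untruncated latent normal plus a censoring mechanism
    #    that collapses out-of-range values onto the bounds (Tobit-style).
    censored = np.clip(rng.normal(mu, sigma, size=n), lower, upper)

    # 3. Sample selection (Heckman-style): bivariate normal errors with
    #    correlation rho; the outcome is observed only if the selection error > 0.
    cov = [[1.0, rho * sigma], [rho * sigma, sigma ** 2]]
    u_sel, u_out = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
    selected = mu + u_out[u_sel > 0.0]

    return truncated, censored, selected


truncated, censored, selected = simulate_dgps(20000, seed=1)
for name, sample in [("truncated", truncated), ("censored", censored), ("selected", selected)]:
    at_bounds = np.mean((sample <= 0.0) | (sample >= 1.0))
    print(f"{name:>9}: n={sample.size}, mean={sample.mean():.3f}, share at/beyond bounds={at_bounds:.3f}")
```

Only the censored sample piles up exactly at 0 and 1; the truncated sample never reaches the bounds, and with positively correlated errors the selected outcomes are shifted upward relative to the population mean.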
